Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Templated COSA command execution #1739

Merged
merged 5 commits into from
Sep 30, 2020
Merged

Templated COSA command execution #1739

merged 5 commits into from
Sep 30, 2020

Conversation

darkmuggle
Copy link
Contributor

@darkmuggle darkmuggle commented Sep 22, 2020

This provides template support for COSA commands with the goal of having
COSA understand the RHCOS jobspec. The goal of this work is to provide
templated commands. For example:
cosa entry --spec rhcos.yaml run cosa init "{{ .Recipe.GitUrl}}"

Or:
cosa entry --spec rhcos.yaml runSteps --steps rhcos.steps

Where rhcos.steps has:

    cosa init "{{ .Recipe.GitUrl }}"
    cosa fetch
    cosa build

The way I envision this being implements is via a container where the JobSpec is provided via ConfigMap:

FROM quay.io/coreos-assembler/coreos-assembler:latest
ENTRYPONT ["/usr/lib/coreos/entry"]

@darkmuggle
Copy link
Contributor Author

darkmuggle commented Sep 22, 2020

Throwing this up as an early WIP to show the idea and see if there any interest in pursuing this. This would give us a way to translate the RHCOS Pipeline commands into GoLang templates for command execution and allow for easy migration. The RHCOS job spec isn't particularly loved per se, but its in production use for the ART and Development pipelines.

The other thought I had was replacing dumb-init with this "entry" based on go-init. My thinking is that by having a custom entry point, we could make COSA an OCP Custom BuildConfig image (e.g. oc start-build bc/cosa -env SPEC_URL=https://... )

@darkmuggle
Copy link
Contributor Author

The reason I went with GoLang over Python for this piece of code was because GoLang is way nicer about cleaning up processes and allowing concurrency for parallel tasks is trivial. I created it as it own thing instead of putting into Mantle directly since it doesn't fit neatly there.

@jlebon
Copy link
Member

jlebon commented Sep 23, 2020

Would it make sense to just make the steps file a shell script instead? So essentially, cosa entry would just render the script using the provided spec, and then os.exec into it.

@darkmuggle
Copy link
Contributor Author

darkmuggle commented Sep 23, 2020

Would it make sense to just make the steps file a shell script instead? So essentially, cosa entry would just render the script using the provided spec, and then os.exec into it.

I thought about using os.exec instead of bringing in a fork of go-init but:

  • COSA has a history of defunct/zombie's hanging around. We have most of those handled, so this not a compelling reasons in and of itself.
  • The ability to use the templates from the CLI is compelling, especially for doing tests and running commands that are normally run via the pipeline on the CLI.
  • We have a concurrency-safe place to corrdinate parallel steps (feature not implemented) and ensure that defunct/zombie's are reaped.

Also, since this doesn't have terminal emulation, its not suitable for interactive developer hack-benches. To be sure, I can see developers entering cosa and running entry run-steps hack.steps.

@cgwalters
Copy link
Member

OK a whole lot going on here.
This seems like it touches on #1668 - it'd be really nice notably to not have two vendor/ directories at least.

We have a concurrency-safe place to corrdinate parallel steps (feature not implemented) and ensure that defunct/zombie's are reaped.

There are a lot of "execute these things in parallel" languages, among them the venerable make to ninja etc. I think we need to grow some concept of dependencies, and it'd be better to reuse an existing tool that has all the niceties like topological sorting, a CLI that displays progress, integration with a jobserver etc. WDYT then about we have cosa build --spec jobspec.yaml that then turns around and generates e.g. a ninja file which then in turn re-executes some of the cosa commands, but honoring dependencies etc.?

@darkmuggle darkmuggle added design design discussion ok-to-bikeshed Permission is granted to bikeshed labels Sep 23, 2020
@darkmuggle
Copy link
Contributor Author

darkmuggle commented Sep 23, 2020

This seems like it touches on #1668 - it'd be really nice notably to not have two vendor/ directories at least.

Partly. The base idea came before #1688; the code idea never saw the light of day, though. I went ahead and dusted this off because I am angling to have COSA run as a build containers via OCP buildconfig and saw that I needed COSA to run more than a single command at once.

👍 on having vendor/ be common.

WDYT then about we have cosa build --spec jobspec.yaml that then turns around and generates e.g. a ninja file which then in turn re-executes some of the cosa commands, but honoring dependencies etc.?

I haven't really looked at Ninja, tbh. I can look. But I was avoiding cosa build specifically to avoid teaching it about the spec or even supporting variant specs.

@darkmuggle
Copy link
Contributor Author

Okay, after looking at #1739 (comment), I think that this approach is still valid. Using existing tools is fine, but ninja and Makefiles really doesn't solve the problem that I was tackling. And this really goes back why coreos/enhancements#1 was filed (and closed because the idea is aweful) -- we need a file-based interface into COSA that is not a CLI.

The genesis to this PR comes from the realization that the real difference between the RHCOS and FCOS pipelines is the business logic. And when I looked deeply at the domain, it becomes clear that the bulk of the Jenkins logic amounts to rendering configuration into CLI arguments and setting up the environment. My position is that giving COSA a file interface will simplify the Jenkins pipelines significantly.

I think we need to grow some concept of dependencies, and it'd be better to reuse an existing tool that has all the niceties like topological sorting, a CLI that displays progress, integration with a jobserver etc.

I agree with understanding dependencies, and I've done some work in that regard. But, I think dependency tracking is orthogonal to this idea; a dependency tree may negate the need to run "stages" but that doesn't negate the sugar of having templating.

Integration with another tool would be a big undertaking, especially if we want a pretty CLI experience. However, integration for the Developer use-case might look different than integration for production use-cases (e.g. Jenkins, FedMsg, OpenShift pods).

@jlebon
Copy link
Member

jlebon commented Sep 25, 2020

The other thought I had was replacing dumb-init with this "entry" based on go-init. My thinking is that by having a custom entry point, we could make COSA an OCP Custom BuildConfig image (e.g. oc start-build bc/cosa -env SPEC_URL=https://... )

I thought about using os.exec instead of bringing in a fork of go-init but:

Hmm, I'm still not sure I follow why we need to fold in go-init. It's only relevant when running cosa as a container, and we handle it there by wrapping the entrypoint with dumb-init, right? E.g. in my pet container workflow, I already have catatonit running as pid 1, so I wouldn't need cosa to have its own. (And ideally eventually this will be built into k8s.)

The genesis to this PR comes from the realization that the real difference between the RHCOS and FCOS pipelines is the business logic. And when I looked deeply at the domain, it becomes clear that the bulk of the Jenkins logic amounts to rendering configuration into CLI arguments and setting up the environment. My position is that giving COSA a file interface will simplify the Jenkins pipelines significantly.

Can you sketch out a bit more what the Jenkins pipeline would look like? It's worth noting that this is introducing two file interfaces: the spec file and the template file. I agree that a spec file would be useful (WDYT about calling this e.g. a property file instead so we don't confuse it with RPM spec files?), and agree that the differences between the pipelines mostly lie in the business logic, but I'm having trouble reconciling how templates fit into the current flow. IIUC, doesn't it imply that whenever we want to do something at the Jenkins layer (e.g. setting the job description, marking a new stage, spawning k8s worker pods for parallel work, starting other jobs, etc...) we would have to keep going in and out of template scripts? Or would all that stuff be obsolete in this new world?

@darkmuggle
Copy link
Contributor Author

darkmuggle commented Sep 25, 2020

Hmm, I'm still not sure I follow why we need to fold in go-init. It's only relevant when running cosa as a container, and we handle it there by wrapping the entrypoint with dumb-init, right? E.g. in my pet container workflow, I already have catatonit running as pid 1, so I wouldn't need cosa to have its own. (And ideally eventually this will be built into k8s.)

I'm backing away from the idea of replacing dumb-init with entry as the default. Rather it would be used in the CI/Pod scenarios. Pulling in the go-init was done to handle cleanup of zombies that I have seen in my own pet container.

Can you sketch out a bit more what the Jenkins pipeline would look like? It's worth noting that this is introducing two file
interfaces: the spec file and the template file.

The world that I envision is COSA-as-a-Service where COSA is an OpenShift BuildConfig. Rather than relying on Jenkins to setup the pod, Jenkins would execute oc start-build bc/cosa --env.... --follow=true.

Why? Right now the RHCOS pipelines are greedy. The entire pipeline consumes a lot of RAM and a lot of disk space for a build and is scheduled on the same pod. In the RHCOS pipeline, we require privileges, which requires to play funny games to get /dev/kvm) and the whole pipeline runs as privileged; this is unneeded and schedules all our workloads on one pod. We could break it up and get the pods scheduled where needed. But in all reality using Jenkins podTemplates is inefficient here since each new pod gets a new agent container too. The other problem is that we need to have a Jenkins container and know what that container is, in all environments that we want to launch a pod. In the RHCOS world, we have found Agent's are a bit finicky and timeout, lose their connection, and because all pipelines are resource spendy, they get evicted.

Now, why templating when OpenShift has templates? One of the lessons learned in the RHCOS pipelines was that we ended up polluting the namespaces with essentially the same thing with minor variations. The idea of using templates is to re-use the same buildconfigs based on versions (ala bc/cosa-{4.[3-6], latest} and bc/cosa-priv-{4.[3-6, latest}).

The reason that envVar's don't work well here is Jenkins right now set the envVars. And don't get me started on truthiness in envVar strings. The RHCOS pipeline's complicated nature comes from the processing of the envVar's. OpenShift templates are a real PITA to update when new options are needed. I originally came up with the idea of having COSA support envVar's for CLI commands; that idea excited people about as much as root canal. When when I looked at the BuildConfig spec, you don't have the CMD and ENTRYPOINT interfaces. The only possible way to have COSA as a buildConfig is to have an entrypoint with an interface for files and/or the buildConfig spec.

The other big reason I wanted to move away from Jenkins as the pod scheduler is because of multi-arch. Right now we have several down-stream teams that take the RHCOS pipeline and then standup Jenkins and run parallel pipelines. The cost of standing up builders is expensive. By having the interfaces this provides, we can run COSA anywhere we can run a container (e.g. ssh me@there 'podman run --rm --entrypoint /usr/bin/entry...).

but I'm having trouble reconciling how templates fit into the current flow.

I would not reconcile it, tbh. This is future looking. Templates give us nothing today.

Consider a distributed build system on aarch64, x86_64, and ppc64el. To start a build in a remote pod via a buildConfig, COSA needs to execute several steps.

# build.steps
cosa buildprep {{ .S3.Url }}
cosa init {{ .Recipe.GitUrl}
cosa fetch
cosa build --skip-prune
cosa buildupload s3 --acl={{ .S3.Acl} {{ .S3.Url }}

And for testing:

aws s3 get s3://{{ .S3.Url }}....
kola run-upgrade --ignition-version v2 -b rhcos...

(A post-commit hook would fetch the artifacts out)

Follow? Now, to @cgwalters's point about build dependencies (i.e why not have COSA understand the dependencies?) is that build dependencies assume that dependencies are in the same place. What if the dependency exists, but needs fetching?

we would have to keep going in and out of template scripts? Or would all that stuff be obsolete in this new world?

Not sure I follow. I think what this will look like is a Jenkins file that looks like:

       stage("build") {
           parallel x86: {
                cosa_oc("creds-x86", "build.steps")
          } aarch64: {
                cosa_oc("creds-aarch64", "builds.steps")
         }

Where build.steps is the example above and cosa_oc is a helper function to run oc start-build bc/cosa --env... --from-file=builds.steps.

Jenkins will still give us the pretty UI, we'll still get the logs.

Finally, the other angle that this gives us is something really slick for the Developer use case. One common pain-point (for those of us unlucky enough to be on Comcast/Spectrum/$EvilCable that only gets 25Mb up) is how miserable it is to upload to a cloud to test your changes, and that assumes you have all the secrets. By having COSA as a buildconfig with this interface, I want to bring a cosa remote command that launches your local {src,overrides} and /usr/lib/coreos-assembler in a remote on the OpenShift of your choosing.

Imagine:
cosa remote --cred ocp.yaml --stages build-upload-to-gcp.steps

Where build-upload-to-gcp.steps has:

cosa buildprep {{ .S3.Url }}
cosa init {{ .Recipe.GitUrl }}
cosa fetch
cosa build --skip-prune
cosa buildextend-gcp --upload....

@darkmuggle darkmuggle changed the title WIP: Templated COSA command execution Templated COSA command execution Sep 28, 2020
@darkmuggle
Copy link
Contributor Author

Okay, I think this is ready for review.

At the high level, I'm electing to keep calling the JobSpec the "JobSpec"; we have prior art in the existing JobSpec, and in the existing pipeline. And at the end-of-the-day, I disagree that someone will confuse an RPM spec with Jobspec. Should we incept a generic build "properties" file, we can add a "property."

Finally, as stated, this is a starting point for distributed builds.

Copy link
Member

@ashcrow ashcrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me but I'd like to see some some info put into a markdown doc so folks who see this for the first time get a sense of how to use it and/or what it is for.

@@ -0,0 +1,161 @@
package spec

type Aliyun struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: In the future I'd love to see these structs with go docs added

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a little bit more than doc string 'em. I removed the Jenkins Specific pieces.

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general I don't understand why we aren't just templating a set -euo pipefail shell script basically. The only slightly tricky thing there is to handle escaping arguments but...ah yeah we have shellquote.Join vendored in mantle already.

BTW I created https://crates.io/crates/sh-inline to do something similar to this for Rust to use in ostree's test suite - it's not quite the same because that's more about inline whereas this seems more about having the templates maintained outside of the code, but some similar ideas.

entrypoint/cmd/entry.go Outdated Show resolved Hide resolved
// cosaDir is set during a complile setting.
if cosaDir == "" {
path, err := os.Getwd()
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...os.Getwd() can only fail I think in situations where the current directory is deleted underneath the process...why are we trying to handle that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

being pendantic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

being pendantic

@darkmuggle To clarify do you mean that you're being pedantic with extra checking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I'm being pedantic with the extra check. It doesn't hurt anything. I definitely wasn't saying that Mr. Walters was being pedantic in the review (which is one way to read my poorly formed sentence fragment.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! To clarify I did read it the first way initially.

But my higher level uncertanty isn't checking for errors - that's always good IMO (now that I look I phrased this badly) so I more meant why are we silently falling back instead of propagating the error? A whole lot of tools out there will fail if their current working directory is deleted - usually this is a user error.

IOW why not just:

if cosaDir == {
    path, err := os.Getwd()
    if err != nil {
        return err
    }

Not a big deal obviously, just one of those detail things that can be good to have a rough consensus on since the pattern tends to reoccur elsewhere.

"time"
)

func RemoveZombies(ctx context.Context, wg *sync.WaitGroup) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally be more in favor of adding an assertion that we're not pid 1 (i.e. require that the caller has used --init or dumb-init in Kubernetes) and documenting how to do that rather than also carrying our own init. But not a blocker.

entrypoint/exec/go_init.go Outdated Show resolved Hide resolved
}

// Wait for command to exit
err = cmd.Wait()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this "start + wait + exit" pattern I think should always raise the question of why we're not just directly exec()-ing the child process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run-steps can run N files; we wait to see if that the command is done.

@cgwalters
Copy link
Member

Forgot to add though I definitely agree with the argument that we should move some "high level steps" into cosa itself. We have a tension here between logic shared in groovy (e.g. https://github.com/coreos/coreos-ci-lib/ ) and this is effectively arguing for draining some of the groovy - which I think makes sense. It'd probably help a lot to understand this PR if e.g. it also added some steps/templates and then switched the .cci.jenkinsfile CI run to use those or so?

Ben Howard added 3 commits September 28, 2020 19:28
This provides template support for COSA commands with the goal of having
COSA understand the RHCOS jobspec. The goal of this work is to provide
templated commands. For example:
   cosa entry --spec rhcos.yaml run cosa init "{{ .Recipe.GitUrl}}"

Or:
   cosa entry --spec rhcos.yaml run-steps rhcos.steps

Where rhcos.steps has:
    cosa init "{{ .Recipe.GitUrl }}"
    cosa fetch
    cosa build

The alternative entry is used when the envVar `COSA_CI_ENTRY=1`.
@darkmuggle
Copy link
Contributor Author

darkmuggle commented Sep 29, 2020

and this is effectively arguing for draining some of the groovy - which I think makes sense.

To be blunt: this is an understatement. My core argument is that the Groovy is actually a liability on the RHCOS side. I think I make the case for what I'm arguing for and why in the added README.md -- and the vision I want to push.

@@ -0,0 +1,145 @@
# Entrypoint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fantastic write up! One nit which I do not think is worth looking at in this PR itself is I believe it would be best to have the "what and how to use" up front followed by the "why". I believe more people will be interested in what it is and how to use it before they end up asking what brought us from where we are today to this pattern.

Signed-off-by: Ben Howard <[email protected]>
@jlebon
Copy link
Member

jlebon commented Sep 29, 2020

This overall looks sane to me. I still don't quite understand why this isn't just another mantle pkg so we share the same vendor directory, and also why we're bundling init functionality instead of just re-using dumb-init which is already in the container image, but we can resolve those in follow-ups.

/approve

@darkmuggle
Copy link
Contributor Author

This overall looks sane to me. I still don't quite understand why this isn't just another mantle pkg so we share the same vendor directory, and also why we're bundling init functionality instead of just re-using dumb-init which is already in the container image, but we can resolve those in follow-ups.

@jlebon I hope that the next PR will help to answer the init function question.

Regarding making this another Mantle command, I did consider it -- and I even considered a shared vendor directory. There is some disagreement about the vendoring approach.

Mantle has a clear purpose in code in doing $THING with an artifact (run, test, publish), while this new command is more meta about running COSA itself. And I did ask two other people about whether I should put this in Mantle; the consensus was no.

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
This can't really break anything right now, so I'm fine moving forward and addressing more things as followups. In particular I think the time where we try to replace parts of the pipeline should be revealing.

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow, cgwalters, darkmuggle, jlebon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ashcrow,cgwalters,darkmuggle,jlebon]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit fe9ae78 into coreos:master Sep 30, 2020
@darkmuggle darkmuggle deleted the pr/cosa-batch-template branch March 9, 2021 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved design design discussion lgtm ok-to-bikeshed Permission is granted to bikeshed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants